Amendments to the Claims:

The following listing of claims will replace all prior versions, and listings, of claims in 
the application: 

1. (Currently Amended) Method of encoding linguistic frequency data, the
method comprising:

mapping each character string occurring in a source text to a numeric identifier
identifying the character string,

identifying a plurality of sets of n character strings in a source text, n being an
integer number greater than 1, each set forming an n-gram comprising at least a first and a
second successive character string,

for each set, obtaining frequency data indicative of the frequency of the 
respective set in the source text, 

creating a memory array f for containing the frequency data, n-1 pointer
memory arrays p2 ... pn for containing pointers, and n offset positional arrays r1 - rn for
containing indexing offsets,

for each character string that is a first character string in at least one of the sets,
assigning a memory position in [[a first]] memory array f to the respective character string and
storing at said memory position the frequency data of each set comprising the respective
character string as the first character string, and

grouping the frequencies relating to n-grams that have the same first character
string into a block within the array f,

for each character string that is a second character string in at least one of the
sets, assigning a memory position in [[a second]] pointer memory array p2 to the respective
character string and storing at said memory position, for each set comprising the respective
character string as the second character string, a pointer pointing to a memory position in the
[[first]] memory array f assigned to the corresponding first character string of the respective set
and having stored the frequency data of the respective set,

grouping the pointers relating to n-grams that have the same second character
string together as a block within the pointer memory array p2,

storing an offset position for each respective first character string in positional
array r1 that indexes to the corresponding block in frequency array f relating to the first
character string, and

storing an offset position for each respective i-th character string in positional
array ri that indexes to the corresponding block in pointer array pi relating to the i-th character
string.
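
By way of non-limiting illustration only, the encoding steps of claim 1 can be sketched for the simplest case n = 2 (bigrams). The function names `encode_bigrams` and `prefix_offsets`, the choice of Python lists for the arrays f, p2, r1 and r2, and the assignment of identifiers in order of first appearance are assumptions made for this sketch and are not taken from the claims.

```python
from collections import Counter


def prefix_offsets(block_sizes):
    """Turn per-identifier block sizes into start offsets (plus one end entry)."""
    offsets = [0]
    for size in block_sizes:
        offsets.append(offsets[-1] + size)
    return offsets


def encode_bigrams(tokens):
    """Build (ids, f, p2, r1, r2) for the bigrams of `tokens` (illustrative only)."""
    # Numeric identifiers in order of first appearance (an arbitrary choice).
    ids = {}
    for t in tokens:
        ids.setdefault(t, len(ids))
    seq = [ids[t] for t in tokens]
    counts = Counter(zip(seq, seq[1:]))

    # Frequency array f: sorting by (first, second) places all frequencies
    # that share a first string into one contiguous block.
    f, position = [], {}
    first_sizes = [0] * len(ids)
    for a, b in sorted(counts):
        position[(a, b)] = len(f)
        f.append(counts[(a, b)])
        first_sizes[a] += 1

    # Pointer array p2: sorting by (second, first) groups the pointers by
    # second string; each pointer is an index into f.
    p2 = []
    second_sizes = [0] * len(ids)
    for a, b in sorted(counts, key=lambda ab: (ab[1], ab[0])):
        p2.append(position[(a, b)])
        second_sizes[b] += 1

    # Offset arrays: r1[a] is where the block of first string a starts in f,
    # r2[b] is where the block of second string b starts in p2.
    r1 = prefix_offsets(first_sizes)
    r2 = prefix_offsets(second_sizes)
    return ids, f, p2, r1, r2
```

For the token sequence "a b a b c a b", this sketch yields f = [3, 1, 1, 1], p2 = [1, 3, 0, 2], r1 = [0, 1, 3, 4] and r2 = [0, 2, 3, 4]. Within each block of p2 the pointers come out in ascending order, which is one possible way to obtain the ordering recited in claim 12.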

2. (Original) The method of claim 1 wherein each set of character strings further
comprises [[a third]] an i-th character string, where i = 3...n, and the method further comprises:

for each character string that is [[a third]] an i-th character string in at least one of
the sets, assigning a memory position in [[a third]] an i-th pointer memory array pi to the respective
character string and storing at said memory position, for each set comprising the respective
character string as [[third]] the i-th character string, a pointer pointing to a memory position in the
[[second]] memory array p(i-1) assigned to the corresponding [[second]] (i-1)-th character string of the
respective set [[and having stored a pointer pointing to the frequency data of the respective set]].

3. (Original) The method of claim 1 wherein each character string is a word of a 
natural language. 

4. (Canceled) 

5. (Currently Amended) The method of [[claim 4]] claim 1 wherein n is equal to 3,
the n-grams being trigrams.

6. (Original) The method of claim 1 wherein said frequency data indicative of
the frequency of the respective set in the source text includes the number of occurrences of 
the respective set in the source text. 

7. (Original) The method of claim 1 wherein said frequency data indicative of 
the frequency of the respective set in the source text includes weight numbers of a maximum 
entropy model. 

8. (Original) The method of claim 1, further comprising:

mapping each character string occurring in the source text to a numeric 
identifier identifying the character string, by operating a finite-state machine. 

9. (Canceled) 

10. (Original) The method of claim 1, further comprising: 

accessing a hash-table for assigning a numeric identifier to each character 
string occurring in the source text. 
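
As a hedged illustration of the hash-table lookup recited in claim 10, the following minimal sketch uses a Python `dict` (a hash table) to assign identifiers. The function name `assign_ids` and the policy of numbering strings in order of first appearance are assumptions of the sketch, not features recited in the claim.

```python
def assign_ids(tokens):
    """Assign a numeric identifier to each distinct string via a hash table."""
    ids = {}
    for t in tokens:
        if t not in ids:
            # A string seen for the first time receives the next free identifier.
            ids[t] = len(ids)
    return ids


# Example: a repeated string keeps the identifier it was first given.
ids = assign_ids("the cat saw the dog".split())
encoded = [ids[t] for t in "the cat saw the dog".split()]
```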

11. (Canceled) 

12. (Original) The method of claim 1, further comprising: 

in the second memory array, sorting the pointers relating to the same second 
character string, with respect to the memory positions of the first memory array to which the 
pointers point. 

13. (Original) The method of claim 1 wherein the pointers are stored in 
compressed form. 

14. (Currently Amended) Method of accessing encoded linguistic frequency data
for retrieving the frequency of a search key in a text encoded according to the method of
claim 1, the search key comprising a first and a second search string, the encoded data being
stored in a first memory array f storing frequency data and a [[second]] memory array p2 storing
pointers to the first memory array f, the frequency data being indicative of the frequencies of
character sets in [[a]] the source text, the character sets each including at least two character
strings, the method comprising:

identifying a [[region]] block in the [[first memory]] array f that is assigned to the first
search string,

identifying a [[region]] block in the second memory array p2 that is assigned to the
second search string,

identifying a pointer stored in the [[region]] block of the [[second memory]] array p2,
pointing to a memory position within the [[region]] block of the [[first memory]] array f, and

reading the frequency data stored at said memory position. 
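
The access steps of claim 14 can be illustrated, again only as a sketch for bigrams, as follows. The function name `bigram_frequency`, the sample arrays, and the use of a binary search over an ascending pointer block (compare claims 12 and 21) are assumptions of this example. The arrays are hard-coded here and correspond to the bigram counts of the token sequence "a b a b c a b".

```python
from bisect import bisect_left

# Hypothetical encoded data: identifiers, frequency array f, pointer array p2,
# and offset arrays r1 (blocks in f) and r2 (blocks in p2).
ids = {"a": 0, "b": 1, "c": 2}
f = [3, 1, 1, 1]
p2 = [1, 3, 0, 2]
r1 = [0, 1, 3, 4]
r2 = [0, 2, 3, 4]


def bigram_frequency(first, second):
    """Return the stored frequency of the bigram (first, second), or 0 if absent."""
    a, b = ids.get(first), ids.get(second)
    if a is None or b is None:
        return 0
    f_lo, f_hi = r1[a], r1[a + 1]   # block in f assigned to the first string
    p_lo, p_hi = r2[b], r2[b + 1]   # block in p2 assigned to the second string
    # Pointers inside a block are ascending, so a binary search finds the
    # first pointer that is not below the start of the block in f.
    k = bisect_left(p2, f_lo, p_lo, p_hi)
    if k < p_hi and p2[k] < f_hi:
        return f[p2[k]]             # pointer lands inside the block: read the frequency
    return 0
```

With these sample arrays, `bigram_frequency("a", "b")` returns 3, and a bigram that does not occur, such as ("c", "b"), returns 0.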

15. (Currently Amended) The method of claim 14 wherein the search key further
comprises a third search string and the encoded data is further stored in a [[third]] memory array
p3 storing pointers to the second memory array p2, wherein the method further comprises:

identifying a [[region]] block in the third memory array p3 that is assigned to the
third search string,

identifying a pointer stored in the [[region]] block of the [[third memory]] array p3,
pointing to a memory position within the [[region]] block of the [[second memory]] array p2, and

tracing the pointer stored in the [[region]] block of the [[third memory]] array p3 back
until the [[region]] block of the [[first memory]] array f is reached.

16. (Original) The method of claim 14 wherein each character string is a word of 
a natural language. 

17. (Original) The method of claim 14 wherein each set of character strings 
comprising n character strings, n being an integer number greater than one, each set being an 
n-gram.

18. (Original) The method of claim 14 wherein n is equal to 3, the n-grams being
trigrams. 

19. (Original) The method of claim 14 wherein said frequency data indicative of 
the frequency of the respective set in the source text includes the number of occurrences of 
the respective set in the source text. 

20. (Original) The method of claim 14 wherein said frequency data indicative of 
the frequency of the respective set in the source text includes weight numbers of a maximum 
entropy model. 

21. (Original) The method of claim 14 wherein identifying a pointer includes 
performing a binary search within the second memory array. 

22. (Currently Amended) The method of claim 14 wherein identifying a pointer
includes identifying a sub-interval in the [[region]] block of the [[second]] memory array p2, the
sub-interval including at least two pointers pointing to a memory position within the [[region]] block
of the [[first memory]] array f.

23. (Original) The method of claim 14, wherein identifying a pointer includes 
performing a first binary search for a set of pairs of strings where the first string in each pair 
matches the first search string, performing a second binary search for a set of pairs of strings 
where the second string in each pair matches the second search string, and calculating an 
intersection of both sets. 
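
One possible reading of the two binary searches and the intersection recited in claim 23 is sketched below. The function name `matching_pairs`, the representation of string pairs as tuples of numeric identifiers, and the sorting performed inside the function are assumptions of the sketch; a practical encoder would presumably prepare both sorted orders once rather than on every query.

```python
from bisect import bisect_left, bisect_right


def matching_pairs(pairs, first, second):
    """Intersect the pairs whose first element matches with those whose second matches."""
    by_first = sorted(pairs)                              # ordered by first element
    by_second = sorted(pairs, key=lambda p: (p[1], p[0]))  # ordered by second element
    firsts = [p[0] for p in by_first]
    seconds = [p[1] for p in by_second]

    # First binary search: the run of pairs whose first element equals `first`.
    set_a = set(by_first[bisect_left(firsts, first):bisect_right(firsts, first)])
    # Second binary search: the run of pairs whose second element equals `second`.
    set_b = set(by_second[bisect_left(seconds, second):bisect_right(seconds, second)])

    # The intersection contains the searched pair if it was encoded, else nothing.
    return set_a & set_b
```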

24. (Currently Amended) The method of claim 14, further comprising:

if the character sets in the source text comprise more character strings than the
search key and if potentially missing search strings exist, arranging the search strings in the
search key such that the potentially missing search strings appear first.

25. (Currently Amended) A system for encoding linguistic frequency data,
comprising:

a processing unit for identifying a plurality of sets of character strings in a 
source text, each set comprising an n-gram including at least a first and a second character
string, and, for each set, obtaining frequency data indicative of the frequency of the respective 
set in the source text, and 

an encoder that [[, for each character string that is a first character string in at
least one of the sets, assigns a memory position in a first memory array to the respective
character string and stores at said memory position the frequency data of each set comprising
the respective character string as first character string, and that, for each character string that
is a second character string in at least one of the sets, assigns a memory position in a second
memory array to the respective character string and stores at said memory position, for each
set comprising the respective character string as second character string, a pointer pointing to
a memory position in the first memory array assigned to the corresponding first character
string of the respective set and having stored the frequency data of the respective set]]

creates a memory array f for containing the frequency data, n-1 pointer
memory arrays p2 ... pn for containing pointers, and n offset positional arrays r1 - rn for
containing indexing offsets,

for each character string that is a first character string in at least one of
the sets, assigns a memory position in memory array f to the respective character string and
stores at said memory position the frequency data of each set comprising the respective
character string as the first character string,

groups the frequencies relating to n-grams that have the same first
character string into a block within the array f,

for each character string that is a second character string in at least one
of the sets, assigns a memory position in pointer memory array p2 to the respective character
string and stores at said memory position, for each set comprising the respective character
string as second character string, a pointer pointing to a memory position in the memory array
f assigned to the corresponding first character string of the respective set and having stored
the frequency data of the respective set,

groups the pointers relating to n-grams that have the same second
character string together as a block within the pointer memory array p2,

stores an offset position for each respective first character string in
positional array r1 that indexes to the corresponding block in frequency array f relating to the
first character string, and

stores an offset position for each respective i-th character string in
positional array ri that indexes to the corresponding block in pointer array pi relating to the i-th
character string.

26. (Currently Amended) A system for accessing encoded linguistic frequency 
data encoded by the system of claim 25 for retrieving the frequency of a search key in a text, 
the search key comprising a first and a second search string, the encoded data being stored in 
[[a first memory]] an array f storing frequency data and [[a second memory]] at least one array p2
storing pointers back to the first memory array f, the frequency data being indicative of the
frequencies of character sets in a source text, the character sets each including at least two 
character strings, the system comprising: 

an input device for inputting the search key, and 

a search engine for identifying a [[region]] block in the first memory array f that is
assigned to the first search string, identifying a [[region]] block in the [[second memory]] array p2
that is assigned to the second search string, identifying a pointer stored in the [[region]] block of
the [[second memory]] array p2, the pointer pointing to a memory position within the [[region]] block
of the [[first memory]] array f, and reading the frequency data stored at said memory position.

27. (New) Method of encoding linguistic frequency data, the method comprising: 

mapping each character string occurring in a source text to a numeric identifier 
identifying the character string, 

identifying a plurality of sets of n character strings in a source text, n being an 
integer number of at least 3, each set forming an n-gram comprising at least a first, a second
and a third successive character string, 

for each set, obtaining frequency data indicative of the frequency of the 
respective set in the source text, 

creating a memory array f for containing the frequency data, and n-1 pointer
memory arrays p2 ... pn for containing pointers, and n offset positional arrays r1 - rn for
containing indexing offsets,

for each character string that is a first character string in at least one of the sets, 
assigning a memory position in memory array f to the respective character string and storing
at said memory position the frequency data of each set comprising the respective character 
string as the first character string, 

for each character string that is a second character string in at least one of the 
sets, assigning a memory position in pointer memory array p2 to the respective character 
string and storing at said memory position, for each set comprising the respective character 
string as the second character string, a pointer pointing to a memory position in the memory 
array f assigned to the corresponding first character string of the respective set and having 
stored the frequency data of the respective set, and

for each character string that is an i-th character string in at least one of the sets,
for i = 3..n, assigning a memory position in an i-th pointer memory array pi to the respective
character string and storing at said memory position, for each set comprising the respective
character string as the i-th character string, a pointer pointing to a memory position in the
memory array assigned to the corresponding (i-1)-th character string of the respective set,

wherein multiple pointers in the i-th pointer memory array pi point to the same
memory position within memory array p(i-1) and only one chain of memory positions within f,
p2 ... pn uniquely define each n-gram.